Day 17 - Regular expressions - Groups
matches an uppercase letter followed by a digit (grouped), a dash, another uppercase letter, and the
same digit that matched in the group. A string like R2-D3 would not match this regular expression,
because the digit 3 doesn’t match the previous digit. Or maybe it won’t match because the processing
engine has a Star Wars lore checker inside!
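The back-reference behaviour described above can be tried with a small sketch. The expression `[A-Z]([0-9])-[A-Z]\1` is assumed here for illustration, the sample lines are made up, and the example relies on a grep that accepts back-references with -E (GNU grep does).

```shell
# Hypothetical sample lines; only the first one repeats the same digit.
droids=$(printf '%s\n' 'R2-D2' 'R2-D3' 'C3-P0')

# ([0-9]) captures one digit; \1 must then match that very same digit.
# Assumption: the local grep supports back-references under -E.
matches=$(printf '%s\n' "$droids" | grep -E '[A-Z]([0-9])-[A-Z]\1')

echo "$matches"
```

Only R2-D2 is printed: in R2-D3 and C3-P0 the second digit differs from the captured one.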
A final interesting feature of groups is that they allow you to use the logical OR operator at a local
level. Let’s first have a look at the operator in a standard regular expression without groups. Having
read this far, you should have no trouble understanding the following regular expressions:
$ cat examples.txt | grep -E "^o"
ostrich
ogre
$ cat examples.txt | grep -E "a$"
gorilla
The first one matches the lines beginning with the letter o, while the second one matches those
ending with a. You can match both at the same time with the logical OR, represented by a pipe |:
$ cat examples.txt | grep -E "^o|a$"
ostrich
gorilla
ogre
Don’t be confused by the use of the pipe symbol. In a regular expression this character doesn’t
connect commands as it does on the command line; it just represents a logical OR. It is a powerful
tool, as it allows you to match several unrelated regular expressions in a single run, without forcing
you to split them into several executions of grep or any other tool.
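As a rough check of this idea, the sketch below compares the single expression containing the pipe with the same two expressions passed to grep separately through its -e option. The sample lines are hypothetical and only stand in for examples.txt.

```shell
# Hypothetical sample lines standing in for examples.txt.
animals=$(printf '%s\n' 'ostrich' 'gorilla' 'ogre' 'tiger')

# One expression with a top-level OR between two whole expressions.
with_or=$(printf '%s\n' "$animals" | grep -E '^o|a$')

# The same two expressions given separately with -e: a line is printed
# if it matches at least one of them.
with_e=$(printf '%s\n' "$animals" | grep -E -e '^o' -e 'a$')

echo "$with_or"
```

Both variables end up holding the same three lines (ostrich, gorilla, ogre), while tiger is left out because it neither starts with o nor ends with a.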
So far, though, the operator can only separate two whole expressions. Groups allow you to use the
logical OR at a local level, as you can see in this example:
$ cat examples.txt | grep -E "[A-Z]([a-z]|[0-9]-)"
Dug the Dog
Police 101
R2-D2
Johnny 5
Spider-Man [*]
Cyborg 009
Big Bad Wolf
* TM Sony Pictures
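To see which fragment of a line makes it match, one option is grep's -o flag, which prints only the matched text. This is a sketch on made-up sample lines, and it assumes a grep that provides -o (GNU and BSD grep do; it is not required by POSIX).

```shell
# Hypothetical sample lines; in C3PO no uppercase letter is followed
# by a lowercase letter or by a digit and a dash.
names=$(printf '%s\n' 'R2-D2' 'Cyborg 009' 'C3PO')

# -o prints each matched fragment on its own line instead of the whole line.
fragments=$(printf '%s\n' "$names" | grep -oE '[A-Z]([a-z]|[0-9]-)')

echo "$fragments"
```

The output is R2- and Cy, one per line; C3PO contributes nothing because the digit after C is not followed by a dash.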
The regular expression matches an uppercase letter, followed by either a lowercase letter (Du, Cy) or
a digit and a dash (R2-). As I want the dash to follow only the digit, if present, this condition would